Introduction: Venous thromboembolism (VTE) in patients with cancer is associated with considerable morbidity, mortality and costs. The first validated VTE risk score, the Khorana Score (KS) (Khorana, Kuderer et al. Blood 2008), was derived from a logistic regression (LR) model based on a large, prospective cohort study of ambulatory patients initiating cancer chemotherapy (ANC Modeling Registry, PI, G Lyman). It has been externally validated in numerous observational studies, randomized controlled trials (RCTs) and meta-analyses of RCTs, has been used to select high-risk patients in RCTs of thromboprophylaxis, and has been integrated into clinical practice guidelines (G Lyman et al, Blood Advances 2021). The inability of the KS to identify all patients with cancer-associated thrombosis (CAT) has prompted efforts to improve its predictive performance by identifying additional risk factors or through variable shrinkage methods. With the emerging role of artificial intelligence, recent efforts to improve model performance have focused on advanced machine learning (ML) techniques. However, the complexity and risk of bias associated with such approaches remain of concern (G Collins. BMJ 2024).

Methods: A systematic review of Medline and Web of Science was undertaken to identify published reports on the development and validation of ML risk models for VTE among patients with cancer. Two independent investigators extracted data, including study population, cancer type, study design, ML algorithms, and the performance measures associated with each. A formal quality appraisal based on the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) Statement was conducted. The primary study outcome was model performance for VTE risk prediction, measured as the Area Under the ROC Curve (AUC) with its 95% CI. Heterogeneity among studies was assessed by the Q-test and the inconsistency index (I2). A meta-analysis was performed using random effects modeling to estimate pooled AUC values and their variance. Study-level moderators of heterogeneity were assessed through subgroup and sensitivity analyses. Publication bias was assessed using funnel plots and Egger's regression intercept.
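
For readers unfamiliar with the pooling step, the following is a minimal sketch of a random effects meta-analysis of study-level AUCs, with the Q statistic and I2 computed along the way. It assumes the DerSimonian-Laird estimator for the between-study variance and recovers each standard error from the reported 95% CI under a normal approximation; the abstract does not state which estimator or software was used, and the function name and input values below are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def pool_random_effects(estimates, ci_lows, ci_highs):
    """DerSimonian-Laird random effects pooling of study-level estimates.

    Assumption: each 95% CI is symmetric on the analysis scale, so the
    standard error can be recovered as (upper - lower) / (2 * 1.96).
    """
    ses = [(hi - lo) / (2 * Z_95) for lo, hi in zip(ci_lows, ci_highs)]
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed effect) weights
    sum_w = sum(w)

    # Fixed effect mean, needed to compute Cochran's Q
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # I2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci_low": pooled - Z_95 * se_pooled,
        "ci_high": pooled + Z_95 * se_pooled,
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical AUCs and 95% CIs for illustration only (not study data)
    aucs = [0.72, 0.81, 0.88, 0.69]
    lows = [0.68, 0.75, 0.84, 0.63]
    highs = [0.76, 0.87, 0.92, 0.75]
    print(pool_random_effects(aucs, lows, highs))
```

Pooling on the raw AUC scale is the simplest choice; because AUC is bounded between 0 and 1, some analysts pool on the logit scale instead and back-transform the result.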

Results: The initial search identified 1,017 studies, of which 10 met a priori defined eligibility criteria. These were conducted in the US (2), EU (2) and Asia (6) between 2016 and 2024, and included 6 studies with multiple cancer types and 4 studies each limited to a single cancer type (lung, gastric, colorectal, or ovarian). Mean study sample size was 1,603 [range: 608-3,398], with VTE rates ranging from 3.4% to 30.9%. Multiple ML algorithms were reported, the most common being Random Forest (60% of studies), Support Vector Machine (40%), and Extreme Gradient Boosting (30%). Given a Q-statistic of 4.58 and an overall inconsistency index (I2) of 98%, sources of heterogeneity across studies were sought. The pooled AUC was 0.79 [95% CI: 0.72-0.85] overall. It was 0.81 [95% CI: 0.74-0.88] in the 6 studies based on split-sample validation compared with 0.72 [95% CI: 0.70-0.73] in the 2 studies with independent external validation datasets (P<0.001); 2 studies provided insufficient information for analysis. Performance was greater for ML models than for the KS in the 6 comparative studies (P<0.001), while no difference was observed between ML models and conventional LR in 6 studies (P=0.407). The potential for bias associated with ML models was not discussed in most studies, with only 1 study providing a formal quality appraisal based on the TRIPOD Statement. Important factors that might account for superior model performance, such as disparities, disabilities, collinearity, missing data and measures to limit overfitting, were not reported in most studies. Assessment of the KS was also limited by the unavailability of clinical details needed to calculate the score correctly or to restrict the KS analysis to patients with solid tumors or lymphoma receiving chemotherapy. No evidence of publication bias was found.

Conclusion: While ML modeling shows promise for improving the performance of VTE risk models in patients with cancer, caution is required to avoid significant bias and overinterpretation of such study results. Independent validation by outside groups is essential to fully understand population-level model performance, particularly in key racial and ethnic groups, prior to clinical application.

Disclosures

Kuderer: Astra Zeneca: Consultancy; Janssen: Consultancy; Pfizer: Consultancy; BMS: Consultancy; Beyond Spring: Consultancy; G1 Therapeutics: Consultancy; Sandoz: Consultancy; Seattle Genetics: Consultancy; Fresenius: Consultancy. Lyman: Beyond Spring: Consultancy; Sandoz: Consultancy; G1 Therapeutics: Consultancy; Seattle Genetics: Consultancy; Fresenius Kabi: Consultancy.
